
Comment
April 29, 2024

Why FT may be wrong to license its content for AI training

News publishers may be well advised to hold their nerve in a game of chicken with the AI companies.

By Dominic Young

The Financial Times has joined a growing group of newsbrands that have licensed their content for use by artificial intelligence pioneer OpenAI. It follows companies such as Axel Springer, Le Monde and Associated Press in doing so.

John Ridding, the FT’s CEO, points to a tension inherent in the deal. On the one hand, it positions the FT as a beneficiary, keeping it “at the forefront of developments in how people access and use information”. But he also acknowledges an underlying ethical concern: it’s right, he says, “that AI platforms pay publishers for the use of their material”. There are few media companies around right now that don’t welcome a bit of extra income.

Mathias Döpfner, CEO of Axel Springer, has also recognised the tension in the relationship between AI and media. At last week’s INMA congress in London, while expressing optimism that AI can help media companies innovate and work better, he called for a new legal framework to protect intellectual property, prompting one conference delegate to tell him his deal was a “pact with the devil”. His comments point to the most important strategic issue this technology poses for the news media.

Every Press Gazette reader will have played with AI by now. It’s pretty impressive. Generative AI systems like ChatGPT, Copilot, Gemini and others can generate convincingly human-like responses, in the form of words, pictures, movies, code, music and even imitation voices, to prompts that you give them.

Breathless hyperbole and a reckless pace of development are lining AI up as an inevitable part of the future of humanity. Buoyed by that belief, and by the allure of an emerging tech sector perhaps worth trillions of dollars, investors have poured vast amounts of money into accelerating the breakneck pace of development.

All of that value, though, is actually being created by you and me, and professional creators and publishers. AI systems “learn” from words, music, films and photos created by humans. Based on this, they produce their own output which appears similar.

They have done this by simply copying everything they can find from the internet and “mining” it to create fuel for the AI rocket ship. If you have ever published anything on the internet, the chances are your work is now integrated into the “intelligence” of numerous AI systems.

Tech platforms, of course, have form for this, and they have had help. Over two decades ago, copyright and other laws were relaxed specifically for the benefit of tech platforms. This led to the previous generation of trillion-dollar search and social platforms, which could be built without the inconvenience those laws would otherwise have created. Without the same legal accountability for what they publish, or its underlying intellectual property, scale and profitability were easy to achieve. Their rise has coincided with the ongoing decline of revenue and profitability in the news and other creative sectors.

AI platforms have followed the same playbook. They have had to; without all the content they feed on, much of AI is useless and therefore worthless. Yet, of the billions of dollars which have been spent on recruiting engineering talent, computer chips and cloud computing capacity, none has been reserved for, or paid to, the individuals and companies whose content has underwritten the entire value of AI systems.

This state of affairs has led, unsurprisingly, to squabbles and conflicts. Lots of them.

On social media and conference platforms, in closed-door negotiations, in legislatures and courts, in Europe, the US and most of the rest of the world, the legal and ethical arguments are being played out. Consensus is very far from emerging. The stakes are high.

The future of AI, and the future of the commercial viability of human creativity, comes down to a simple question: is it OK to use someone else’s work, without their permission, to train an AI?

Copyright law says that you can’t copy someone else’s work without their permission. In the case of AI, it would seem pretty clear cut – AI training databases are made up of literally billions of copies of things.

But, on the other side, the law also has exceptions. There are certain conditions under which it’s legally acceptable to make and use copies without first getting permission. These exceptions vary, a bit, from one country to another. Usually, they’re pretty clear cut, but in the United States they’re wrapped up in the “Fair Use” doctrine, which is vague enough to need frequent interpretation by courts.

Entire legal careers have been devoted to arguing about Fair Use. As a non-American non-lawyer, my best attempt at a summary is to highlight the key concept: is a particular use of someone else’s work fair or not? Take a book review which includes quotes from the book in question. The use doesn’t damage the commercial opportunity for the book’s publisher; in fact, probably the opposite, if it encourages more people to read the book. Given that AI companies intend their output to be a substitute for the original human-created work which trained them, I think they face a tough time making their case.

The AI companies seem to think so too. As the lawsuits which will decide this formally wend their way through the US and other courts, AI companies are out doing deals.

This is a small start, and probably sensible. If all their copying to date is judged not to have been legitimate, AI companies are going to have a hard time staying in business. If their future use of other people’s content needs to be negotiated, they’d better get a head start. Otherwise they’re making a big all-or-nothing bet that they will win in court, not just in the US but all around the world, where exceptions to copyright are much more clearly defined.

Why content publishers like the FT should ‘exercise caution’ in AI deals

In any event, as more and more of the content they scrape from the internet is itself generated by AI, the risk of “model collapse” means dependable sources will become ever more important, including to validate (“ground”, in AI jargon) results and reduce the prevalence of made-up information being presented as fact. This happens a lot: AI people refer to the phenomenon as “hallucinations”.

All this sounds a lot like good news on the horizon for content owners such as the beleaguered news industry. New sources of revenue have been elusive over the last couple of decades, and the old ones have been drying up. If AI is inevitable, at least getting some cash out of it can’t be a bad thing. No doubt that’s part of the reasoning behind some of the deals we have seen announced.

But most companies that control enough content to be interesting to AI firms are, or should be, exercising caution. Key players, such as newspapers, are wary of allowing AI to use their content to make products which aim to displace them in the eyes of consumers. That would be a self-sabotaging act, however much money they are offered.

They are no doubt also conscious that, despite the rather ill-defined future benefits of AI which we’re all promised, so far the risks have been much more visible. Students have used AI to cheat on tests. Other students have been falsely accused after AI tools designed to detect AI-generated content made mistakes. Lawyers have shown up in court citing entirely fictitious, AI-generated legal precedents. AI has been used to create faked pornographic images of children as well as celebrities, and convincing fake voices are being used to scam people over the phone.

The list of “unforeseen” consequences grows longer every day, alongside known issues like AI’s tendency to invent entirely fictitious “facts”. Yet the AI bandwagon rampages on.

Which means that now is probably a pretty bad time for anyone to license their content for AI training. Nobody knows where this is going to go, what value is being exchanged (and therefore what a good deal looks like), or the real nature and extent of the risks. Those risks are reputational and commercial, as well as the societal ones we are all aware of.

Nobody owes AI companies a right to exist, certainly not the owners of the content they have appropriated and mined for their own benefit. The current legal, legislative and lobbying frenzy will need to play out a little bit longer and we can all hope that the lines will be re-drawn more fairly as a result, protecting the rights of everyone who wants a say in how their work is used by others.

In the meantime, the best option might be to just wait. Playing chicken can be an unnerving game but it’s the AI companies who will have to swerve in the end.


